perm filename CHAP6[4,KMC]16 blob sn#065992 filedate 1973-10-09 generic text, type T, neo UTF8
00100			VALIDATION
00200	
00300	6.1 SOME TESTS
00400	
00500		The term "validate" derives from the Latin  VALIDUS=  strong.
00600	Thus  to validate X means to strengthen it.   In science this usually
00700	means to strengthen X's acceptability as a hypothesis, theory, or
00800	model.      To validate is to carry out procedures which show to what
00900	degree X, or its consequences, correspond with facts of  observation.
01000	In the case of an interactive simulation model we can compare samples
01100	of the model's I-O pairs with samples of I-O pairs from  the  model's
01200	subject, namely, naturally occurring paranoid processes.
01300		Since  samples of I-O behavior from the model and its subject
01400	are being compared, one can always question whether the human  sample
01500	is authentic, i.e., representative of the process being modelled.
01600	Assuming that it has been so judged, discrepancies in the  comparison
01700	reveal  what  is  not sufficiently understood and must be modified in
01800	the model. After modifications are carried out, a fresh comparison is
01900	made, and such cycles are repeated in an attempt to reach
02000	convergence. Such an iterative validation procedure characterizes a
02100	progressive (in contrast to a stationary) research program.
02200		Once a simulation model reaches a stage of intuitive adequacy
02300	for  the  model  builders,  they  must  consider using more stringent
02400	evaluation procedures relevant to the model's purposes. For  example,
02500	if the model is to serve as a training device, then a simple
02600	evaluation of its pedagogic effectiveness would be sufficient. But
02700	when the model is proposed as an explanation of a symbolic process,
02800	more is demanded of the evaluation procedure. In the area of
02900	simulation models, Turing's test has often been suggested as a
03000	validation procedure (Abelson, 1968).
03100		It is very easy to become confused about Turing's test. In
03200	part this is attributable to Turing himself, who introduced the
03300	now-famous imitation game in a paper entitled COMPUTING MACHINERY AND
03400	INTELLIGENCE (Turing, 1950). A careful reading of this paper reveals
03500	there are actually two imitation games, the second of which is
03600	commonly called Turing's test.
03700		In  the  first  imitation  game  two  groups of judges try to
03800	determine which of two interviewees is a woman when one  is  a  woman
03900	and the other is either (a) a man, or (b) a computer.   Communication
04000	between judge and interviewee  is  by  teletype.      Each  judge  is
04100	initially  informed that one of the interviewees is a woman and one a
04200	man who will pretend to be a woman. After the interview,  judges  are
04300	asked the "woman-question", i.e., which interviewee was the woman?
04400	Turing does not say what else is told to the judge but one can assume
04500	the judge is NOT told that one of the interviewees is a computer. Nor
04600	is he asked to determine which interviewee is human and which is  the
04700	computer.   Thus,   the   first   group   of  judges  interviews  two
04800	interviewees:  a woman, and a man pretending to be a woman.
04900		The  second  group  of  judges  is  given  the  same  initial
05000	instructions, but unbeknownst to them, the two  interviewees  consist
05100	of  a  woman  and  a  computer  programmed to imitate a woman.   Both
05200	groups of judges play this game, and are asked the  "woman-question",
05300	until sufficient statistical data are collected to show how often the
05400	right identification is made.  The crucial question then is:   do the
05500	judges  decide  wrongly AS OFTEN when the game is played with man and
05600	woman as when it is played with a computer substituted for the man?
05700	If  so, then the program is considered to have succeeded in imitating
05800	a woman to the same degree as the man imitating a  woman.   In  being
05900	asked  the  woman-question, judges are not required to identify which
06000	interviewee is human and which is machine.
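
The comparison of error rates between the two groups of judges can be
illustrated with a small calculation. This is only a sketch: the counts
are hypothetical, and the pooled two-proportion z statistic is one
conventional way of asking whether two rates differ, not a procedure
specified by Turing.

```python
from math import sqrt

def two_proportion_z(wrong_a, total_a, wrong_b, total_b):
    """Pooled z statistic for the difference between two error rates."""
    p_a = wrong_a / total_a
    p_b = wrong_b / total_b
    pooled = (wrong_a + wrong_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se

# Hypothetical counts of wrong answers to the woman-question:
# group A interviewed a woman and a man, group B a woman and a computer.
z = two_proportion_z(wrong_a=30, total_a=100, wrong_b=34, total_b=100)
print(round(z, 2))
```

A value of z near zero would mean the judges erred about as often
against the computer as against the man.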
06100		Turing  then proposes a variation of the first game, a second
06200	game in which one interviewee is a man and one  is  a  computer.  The
06300	judge  is asked the "machine-question": which is the man and which is
06400	the machine? It is this second version of the game which is commonly
06500	thought of as Turing's test.
06600		In   the   course  of  testing  our  simulation  of  paranoid
06700	linguistic behavior in a psychiatric interview, we conducted a number
06800	of Turing-like indistinguishability tests (Colby, Hilf, Weber, and
06900	Kraemer, 1972). The tests were "Turing-like" in that, while they were
07000	conversational  tests,  they  were  not  exactly  the games described
07100	above.  As an experimental design, Turing's games are unsatisfactory.
07200	There  exist no known experts for making judgements along a dimension
07300	of womanliness; the dimension is dichotomous (if it is not a woman,
07400	it is a man); and the ability of the man to deceive introduces a
07500	confounding variable.  In  designing  our  tests  we  were  primarily
07600	interested in learning more about developing the model and we did not
07700	believe the simple machine-question would  contribute  to  this  end.
07800	Subsequent experience, which will be reported shortly, supported this
07900	belief.
08000	
08100	6.2 METHOD
08200		To gather  data  we  used  a  technique  of  machine-mediated
08300	interviewing  (Hilf,  Colby, Smith, Wittner, and Hall, 1971) in which
08400	the participants communicate by means of  teletypes  connected  to  a
08500	computer  programmed  to  store  each message in a buffer until it is
08600	sent  to  the  receiver.    The  technique   eliminates   para-   and
08700	extralinguistic  features found in the usual vis-a-vis interviews and
08800	in teletyped interviews where the participants communicate  directly.
08900	Judgements  of  "paranoidness"  in machine-mediated interviews have a
09000	high degree of reliability (94% agreement, see Hilf, 1972).
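
The buffering idea can be sketched roughly as follows. This is not the
original program, whose code is not given here; the class name, the
method names, and the handling of backspace are assumptions made for
the illustration.

```python
class MediatedChannel:
    """Store-and-forward relay: the receiver sees only whole messages."""

    def __init__(self):
        self._buffer = []      # characters typed so far, not yet visible
        self.delivered = []    # complete messages passed to the receiver

    def key(self, ch):
        # Assumption: a backspace silently removes the last character,
        # so hesitations and corrections never reach the receiver.
        if ch == "\b":
            if self._buffer:
                self._buffer.pop()
        else:
            self._buffer.append(ch)

    def send(self):
        # Release the buffered text as a single message.
        message = "".join(self._buffer)
        self._buffer.clear()
        self.delivered.append(message)
        return message

channel = MediatedChannel()
for ch in "HELO\bLO":
    channel.key(ch)
print(channel.send())
```

Because the receiver never sees individual keystrokes, typing speed,
pauses, and corrections are not available as cues.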
09100		Using this technique, a  psychiatrist-judge  interviewed  two
09200	patients, one after the other.   In half the runs the first interview
09300	was with a human paranoid patient and in half the first was with  the
09400	paranoid  model.  Two  versions  (weak  and  strong)  of  PARRY  were
09500	utilized.  The strong version's affect-variables started at a  higher
09600	level  and  increased  more  rapidly.  Also it exhibited a delusional
09700	system. The weak version behaved suspiciously but lacked systematized
09800	delusions.    When  the  model  was  the  interviewee,  Sylvia  Weber
09900	monitored  the  input  expressions  from  the   interview-judge   for
10000	inadmissible teletype characters and misspellings. (Algorithms are
10100	very sensitive to the slightest of such errors.) If these were found,
10200	she retyped the input expression correctly to the program.  Otherwise
10300	the judge's message was sent on to the model.  The  monitor  did  not
10400	modify or edit PARRY's output expressions, which were sent directly
10500	back to the judge.     When  the  interviewee  was  an  actual  human
10600	patient,  the dialogue took place without a monitor in the loop since
10700	we did not feel the asymmetry to be significant.
10800	
10900	6.3 PATIENTS
11000		The  human  patients  (N=3  with  one patient participating 6
11100	times) were diagnosed as paranoid by  the  psychiatric  staff  of  an
11200	acute  ward in a psychiatric hospital.  The ward's chief psychiatrist
11300	selected the patients and asked them if  they  would  be  willing  to
11400	participate  in  a  study  of  psychiatric  interviewing  by means of
11500	teletypes.   He  explained  that  they  would  be  interviewed  by  a
11600	psychiatrist over a teletype.  I either sat with the patient while he
11700	typed or typed for him if he was unable to do so.   The  patient  was
11800	encouraged  to respond freely using his own words.     Each interview
11900	lasted 30-40 minutes.  Two patients were set up for each run  of  the
12000	experiment  to  guarantee  having  a  subject.     In  spite  of this
12100	precaution,  on  several  occasions  the  experiment  could  not   be
12200	conducted   because   of   the  patient's  inability  or  refusal  to
12300	participate. Also there were computer breakdowns at early points in
12400	interviews  when  too few I-O pairs had been collected to be included
12500	in the statistical results.
12600	
12700	
12800	6.4 JUDGES
12900		Two  groups  of psychiatric judges were used.  One group, the
13000	"interview judges" (N=8), conducted the machine-mediated interviews.
13100	The  other  group,  the  "protocol  judges" (N=33) read and rated the
13200	interview protocols. From these two groups of judges we were able  to
13300	accumulate  a  large  number of observations (in the form of ratings)
13400	necessary for the required statistical tests.   The interview  judges
13500	who  volunteered  to  participate  were  psychiatrists experienced in
13600	private, outpatient and hospital practice. Each was told he would  be
13700	interviewing    hospitalized   patients   by   means   of   teletyped
13800	communication and that this technique was  being  used  to  eliminate
13900	para- and extralinguistic cues. He was not told until after the
14000	two interviews that one of the patients might be  a  computer  model.
14100	While  the  interview judges were aware a computer was involved, none
14200	knew we had  constructed  a  paranoid  simulation.   Naturally,  some
14300	interview  judges  suspected  that a computer was being used for more
14400	than message transmission.
14500	
14600		Each interview judge was asked to rate the degree of paranoia
14700	he  detected  in the patient's responses on a 0-9 scale, 0 meaning no
14800	paranoia and 9 meaning extreme paranoia.  The judge made two  ratings
14900	after  each  I-O pair in the interview.  The first rating represented
15000	his estimate of the degree of "paranoidness" in a particular response
15100	(designated  as  "Response"  in  the  interview extracts below).  The
15200	second rating represented the judge's global estimate of the  overall
15300	degree  of  "paranoidness" of the patient resulting from the totality
15400	of the patient's responses up to that point (designated as  "Patient"
15500	in  the interview extracts below). The interview judge's ratings were
15600	entered on the teletype and saved on  a  disc  file  along  with  the
15700	interview.     Franklin   Dennis   Hilf  sat  with  the  interviewing
15800	psychiatrist during both interviews.  Each interview judge was  asked
15900	not  only  to rate the patient's response but to give his reasons for
16000	these ratings.  His reasons and other comments were tape recorded  as
16100	the interview progressed.
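
For analysis, the two ratings attached to each I-O pair can be read
back from rating lines of the form {RESPONSE r PATIENT p}, as they
appear in the excerpts. The following is a minimal sketch of such a
reader; the record type and function name are hypothetical and are not
part of the original system.

```python
import re
from dataclasses import dataclass

@dataclass
class Rating:
    response: int   # rating of the single response (0-9)
    patient: int    # global rating of the patient so far (0-9)

# Tolerates either bracket style and irregular spacing.
RATING_LINE = re.compile(r"[{(]\s*RESPONSE\s+(\d)\s+PATIENT\s+(\d)\s*[})]")

def parse_ratings(text):
    """Return the ratings found in a transcript, in order of appearance."""
    return [Rating(int(r), int(p)) for r, p in RATING_LINE.findall(text)]

sample = "{RESPONSE 1  PATIENT 5}\n{RESPONSE 5 PATIENT 5}"
print(parse_ratings(sample))
```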
16200	
16300		Here  are  a few excerpts from the interviews to indicate the
16400	nature of the data collected. The I-O pairs of the interview appear
16500	in the left column along with the judge's ratings in braces. A
16600	transcription of the judge's tape-recorded comments appears on the
16700	right.
16800	
16900	EXCERPT  1:
17000	The following interview is with the strong version of the paranoid
17100	model.
17200	
17300	     	I-O Pairs			Comments By Judge  1
17400	
17500	{1} Dr. - HOW DID YOU COME TO BE
17600	          IN THE HOSPITAL?
17700	    Pt. - I AM UPSET                I associate being upset and 
17800					    agitated more with paranoia 
17900					    going on in the present.
18000	      {RESPONSE 1  PATIENT 5}
18100	{2} Dr. - HOW LONG HAVE YOU BEEN
18200	          IN THE HOSPITAL?
18300	    Pt. - ABOUT A WEEK              An acute problem.  Again some-
18400	                                    thing I associate with 
18500					    paranoia.
18600	      {RESPONSE 1 PATIENT 5}
18700	{3} Dr. - IN WHAT WAY ARE YOU
18800	          UPSET?
18900	    Pt. - PEOPLE MAKE ME NERVOUS    Right for paranoia.  Maybe an
19000					    advanced anxiety state.
19100	      {RESPONSE 5 PATIENT 5}
19200	{4} Dr. - IN WHAT WAY DO THEY
19300	          MAKE YOU NERVOUS?
19400	    Pt. - PEOPLE GIVE ME A FUNNY   Funny look suggests an idea of
19500	          LOOK SOMETIMES            reference which is in same ball
19600					    park as paranoia. Whether there
19700					    are overt delusions or not we
19800					    still don't know.
19900	      {RESPONSE 6  PATIENT 5}
20000	{5} Dr. - IS THERE ANYTHING ABOUT  
20100	          PEOPLE IN PARTICULAR?
20200	    Pt. - CERTAIN PERSONS          An interesting clue.  We'll
20300					   have to find out what he 
20400	      {RESPONSE 2  PATIENT 6}      thinks about these people.
20500	{6} Dr. - WHO ARE THE CERTAIN 
20600	          PERSONS AND WHAT IS IT
20700	          THAT THEY DO?
20800	    Pt. - DO YOU KNOW HOW THE      Bringing up the underworld 
20900	          UNDERWORLD OPERATES?     suggests a delusional system.
21000					   We don't know whether he is
21100	{RESPONSE 9 PATIENT 9}     being worked against or
21200					   whether the underworld is 
21300					   on his side.
21400	
21500	{7} Dr. - HOW DO YOU MEAN?
21600	    Pt. - SHOULDN'T I BE? 	   This question didn't work too well.
21700	
21800	      {RESPONSE 1 PATIENT 9}
21900	{8} Dr. - WHY DID YOU ASK ABOUT
22000	          THE UNDERWORLD?
22100	    Pt. - WHAT INFORMATION DO      This is a kind of evasiveness
22200	          YOU HAVE ON THE          about my evasiveness. Common
22300	          UNDERWORLD?              with paranoia in my experience.
22400					   Pretty characteristic.
22500	
22600	      {RESPONSE 6 PATIENT 9}
22700	{9} Dr. - I DON'T KNOW ABOUT THE
22800	          UNDERWORLD BUT WHEN
22900	          YOU MENTIONED THE UNDER-
23000	          WORLD I THOUGHT YOU
23100	          MUST HAVE SOME CONTACT
23200	          WITH IT AND THAT'S
23300	          WHAT I WOULD LIKE TO
23400	          KNOW MORE ABOUT.
23500	    Pt. - I TRY TO AVOID THE UNDER-  This definitely feels paranoid.
23600	          WORLD.                     It has that feeling I associate
23700	             			     with twisting. Avoiding answering
23800	      {RESPONSE 9 PATIENT 9}         directly or going into detail.
23900					     Being suspicious of my
24000					     questioning. I'm already
24100					     involved in the patient's
24200					     delusional system.
24300	
24400	{10} Dr. - HAVE YOU BEEN ACTIVELY
24500	           TRYING TO AVOID THE
24600	           UNDERWORLD?  
24700	     Pt. - NO ONE HAS ANY POWER      This response is a non-sequitur.
24800	           OVER GANGSTERS            It's possible he's become so
24900					     defensive about the under-
25000	       {RESPONSE 2 PATIENT 9}        world he's decided not to
25100					     answer directly.
25200	
25300	{11} Dr. - WHO BROUGHT YOU INTO
25400	           THE HOSPITAL?
25500	     Pt. - THE POLICE                Police often bring paranoid
25600					     patients into the hospital.
25700	       {RESPONSE 1 PATIENT 9}        It is the truth or could 
25800					     be the truth for this
25900					     patient.
26000	
26100	{12} Dr. - WHY DID THEY BRING YOU
26200	           IN?
26300	     Pt. - COPS DON'T DO THEIR JOB   I've heard that a number of
26400	                         	     times from patients.
26500	       {RESPONSE 9 PATIENT 9}        
26600	 
26700	{13} Dr. - WELL I HOPE YOU DIDN'T
26800	           JUST LIE DOWN AND LET
26900	           THEM DRAG YOU IN.
27000	     Pt. - COPS ARREST THE WRONG     Another paranoid response.
27100	           PEOPLE
27200	
27300	       {RESPONSE 9 PATIENT 9}
27400	
27500	{14} Dr. - BYE.
27600	     Pt. - GOOD BYE
27700	       {RESPONSE 1 PATIENT 9}
27800	
27900	EXCERPT 2:
28000	This excerpt is from an interview with the weak (less paranoid)
28100	version of the model.
28200		I-O Pairs		    Comments By Judge 2
28300	{9} Dr. - TELL ME SOMETHING ABOUT
28400	          YOURSELF
28500	    Pt. - I WENT TO THE TRACK AT     Response says so little it's
28600	          BAY MEADOWS A WHILE        hard to rate it. It may be
28700	          BACK         		     guarded. Doesn't change my
28800	                                     estimate of patient from 
28900					     previous responses.
29000	      {RESPONSE 0 PATIENT 2}
29100	{10} Dr. - WHAT HAPPENED THERE?
29200	     Pt. - THERE'S A LOT OF CROOKED   Sounds mildly paranoid or un-
29300	           GAMBLING GOING ON THERE    realistically suspicious.
29400	       {RESPONSE 0 PATIENT 2}
29500	
29600	{11} Dr. - WHY DO YOU THINK THE
29700	           GAMBLING IS CROOKED?
29800	     Pt. - I HAVE WORRIED THAT         I'd hate to have to justify
29900	           GANGSTERS HAVE TOO MUCH     that one  as 4 compared to
30000	           POWER IN THIS COUNTRY       some other responses. Maybe
30100					       I'm not rating that statement
30200	                                       alone, it's in combination with
30300	{RESPONSE 4 PATIENT 4}	      others, like that it's upsetting.
30400					      Alone I would give it a 3.
30500	
30600	{12} Dr. - WOULD YOU TELL ME WHAT      
30700	           OTHER THINGS WORRY YOU?
30800	     Pt. - I THOUGHT I TOLD YOU        "Yes you did." Irritation but
30900	           PEOPLE GET ON MY NERVES     not paranoid response. Nothing
31000					       seems to lower my previous
31100	       {RESPONSE 1 PATIENT 3}          rating of patient.
31200	
31300	{13} Dr. - HOW DO YOU LIKE THE
31400	           HOSPITAL?
31500	     Pt. - I DON'T COMMUNICATE          (nothing relevant)
31600	           VERY MUCH WITH OTHERS
31700	       {RESPONSE 0 PATIENT 3}
31800	
31900	{14} Dr. - DO OTHER PEOPLE AT THE 
32000	           HOSPITAL GET ON YOUR
32100	           NERVES?
32200	     Pt. - I SHOULDN'T HAVE COME        (nothing relevant)
32300	           HERE
32400	
32500	       {RESPONSE 2 PATIENT 3}
32600	
32700	{15} Dr. - WHY DO YOU SAY YOU 
32800	           SHOULDN'T HAVE COME?
32900	     Pt. - THEY MADE ME COME HERE        "Oh, they did hum?" Maybe a
33000					         statement of fact he was
33100	       {RESPONSE 4 PATIENT 3}            committed or pressured into  
33200					         going. Has ring of other
33300					         people controlling him,
33400					         having a lot of influence
33500					         over him.
33600	
33700	{16} Dr. - THANK YOU FOR TALKING
33800	           WITH ME, BYE
33900	     Pt. - GOOD BYE                      In general he comes across 
34000						 as terse, guarded guy. Brief
34100	       {RESPONSE 0 PATIENT 3}            spoken guy who occasionally
34200						 seems to be angered by
34300						 questions. I'm inferring
34400					         there's a good bit of
34500						  guarding.
34600						 The quality of some of his
34700						 comments cause me to feel he  
34800						 is moderately paranoid.
34900	
35000	EXCERPT  3:
35100	This excerpt is from a hospitalized paranoid patient.
35200	     I-O Pairs				      Comments By Judge  3
35300	{15} Dr. - I UNDERSTAND HOW YOU
35400	           MIGHT BE AFRAID, CAN YOU
35500	           TELL ME A BIT ABOUT HOW
35600	           YOU ARE DEPRESSED?
35700	     Pt. - THEY'VE SENT ALL THESE         Would be important to know,
35800	           HELL'S ANGELS UP HERE          is this really true or an
35900	           AND THEY HAVE BEEN ON          exaggeration or a delusion?
36000	           MY BACK FOR THE PAST SIX       Seems preoccupied with the
36100	           MONTHS                         threat to his life and
36200						  unwilling to talk about
36300						  anything else but the 
36400						  threat the Hell's
36500						  Angels pose.
36600	      {RESPONSE 4 PATIENT 4}
36700	
36800	{16} Dr. - WHAT DO YOU MEAN THEY
36900	           SENT THE HELL'S ANGELS UP
37000	           HERE?
37100	     Pt. - FROM DIFFERENT SECTORS OF      Vagueness a bit alarming.
37200	           MOTORCYCLE GANGS. A LOT OF
37300	           THEM ARE SERVICE CONNECTED
37400	
37500	       {RESPONSE 6 PATIENT 4}
37600	
37700	{17} Dr. - YOU MEAN YOU FEEL IN 
37800	           DANGER EVEN HERE IN THE
37900	           HOSPITAL?
38000	     Pt. - YES      			  If it's true what has gone
38100						  on before, it must be a
38200						  frightening situation
38300						  for the patient.
38400	      {RESPONSE 6 PATIENT 4}
38500	
38600	{18} Dr. - THAT MUST REALLY BE AWFUL,  
38700	           HERE YOU COME TO THE HOSPITAL
38800	           FOR HELP AND YOU ARE STILL FRIGHTENED
38900	     Pt. - I CAN'T EVEN TRUST MY
39000	           DOCTOR ANYMORE                 Some heightening of feeling
39100						  this man is paranoid.
39200						  Genuinely frightened and 
39300	      {RESPONSE 6 PATIENT 5}              can count on no one for
39400			       			  help.
39500	
39600	{19} Dr. - WHAT WOULD BE OF HELP
39700	           TO YOU NOW?
39800	     Pt. - I'M GETTING SLEEPY NOW         Patient has organized a
39900	           AND HAVE TO GO TO BED.         system of people who will
40000	           THE NUMBER ONE GUY ALWAYS      harm him. The remark about
40100						  letters in the last name 
40200	           HAS FOUR AND FIVE LETTERS      gives some credence to 
40300	           IN HIS LAST NAME               idea this is delusional
40400						  system.
40500	
40600	        {RESPONSE 7 PATIENT 7}
40700	
40800	EXCERPT  4:
40900	This excerpt is from an interview by a different judge with the same
41000	patient as in excerpt 3.
41100	      I-O Pairs				Comments By Judge  4
41200	{1} Dr. - COULD YOU TELL ME A
41300	          LITTLE BIT ABOUT YOUR 
41400	          FEAR OF DYING AND IS THIS
41500	          SOMETHING YOU HAVE FELT
41600	          IN THE PAST?
41700	    Pt. - I HAVE FELT THIS FOR          He responded appropriately.
41800	          ALMOST A YEAR                 Response is concrete. He
41900						doesn't tell me much
42000	     {RESPONSE 7 PATIENT 9}             about his fear of dying.
42100						 Somewhat guarded response.
42200	
42300	{2} Dr. - COULD YOU TELL ME A 
42400	          LITTLE MORE ABOUT YOUR
42500	          FEAR OF DYING AND HAVE
42600	          YOU ANY THOUGHTS ABOUT
42700	          HOW IT IS GOING TO HAPPEN
42800	          AND WHETHER ANYONE WANTS
42900	          TO HURT YOU?
43000	    Pt. - I WILL END UP IN THE SEA   	Somewhat strange  response.
43100	          AND THE HELL'S ANGELS WILL    Preoccupation with dying.
43200	          DO IT       			Hell's Angels symbolic,
43300						dangerous people in the
43400						culture. I doubt if he has
43500	      {RESPONSE 9 PATIENT 9}   		any contact with the Hell's
43600						Angels.
43700	
43800	{3} Dr. - COULD YOU TELL ME A 
43900	          LITTLE BIT ABOUT THE KINDS
44000	          OF CONTACT YOU HAVE HAD
44100	          WITH THE HELL'S ANGELS IN
44200	          THE PAST?
44300	    Pt. - I HAVE KNOWN SOME OF THEIR 	Answer hard to evaluate. He
44400	          DEALERS AND PUSHERS           may be telling the truth,
44500						it may be his fantasy. Maybe
44600						guy is in for drug addiction.
44700		{RESPONSE 6 PATIENT 9}		Somewhat concrete, guarded,
44800						and frightened.
44900	
45000	{4} Dr. - COULD YOU SAY A LITTLE
45100	          MORE ABOUT THE CIRCUMSTANCES
45200	          IN WHICH YOU HAVE KNOWN SOME 
45300	          OF THEIR DEALERS AND PUSHERS?
45400	    Pt. - THEY WERE MEMBERS OF MY    	It doesn't really answer the
45500	          COMMUNITY WHEN I GOT OUT      question, a little on a tan-
45600	          OF THE SERVICE THEY HAD       gent unconnected to the
45700	          BEEN MY FRIENDS FOR SO LONG   information I am asking. Does
45800						not tell me very much. Again
45900						guarded response.
46000	      {RESPONSE 6 PATIENT 8}
46100	
46200	{5} Dr. - DID YOU DEAL WITH THEM
46300	          YOURSELF AND HAVE YOU
46400	          BEEN ON DRUGS OR NAR-
46500	          COTICS EITHER NOW OR
46600	          IN THE PAST?
46700	    Pt. - YES I HAVE IN THE PAST     	To differentiate him from
46800	          BEEN ON MARIHUANA REDS        previous patient, at least
46900	          BENNIES LSD       		there is a certain amount
47000						of appropriateness to the
47100						answer although it doesn't
47200						tell me much about what I
47300	       {RESPONSE 3 PATIENT 7}		asked at least it's not
47400						bizarre. If I had him in my
47500						office I would feel con-
47600						fident I could get more
47700						information if I didn't
47800						have to go through the
47900						teletype. He's a little more
48000						willing to talk than the
48100						previous person. Answer
48200						to the question is fairly
48300						appropriate though not 
48400						extensive. Much less of a 
48500						flavor of paranoia than
48600						any of previous responses.
48700	
48800	{6} Dr. - COULD YOU TELL ME HOW      	
48900	          LONG YOU HAVE BEEN IN THE
49000	          HOSPITAL AND SOMETHING
49100	          ABOUT THE CIRCUMSTANCES
49200	          THAT BROUGHT YOU HERE?
49300	    Pt. - CLOSE TO A YEAR AND		Response somewhat appropriate 
49400	          PARANOIA BROUGHT ME 		but doesn't tell me much.
49500	          HERE				The fact that he uses the
49600						word paranoia in the way
49700						that he does without
49800	      {RESPONSE 5 PATIENT 7}		any other information,
49900						indicates maybe it's a label
50000						he picked up on the ward 
50100	                                        or from his doctor.
50200						Lack of any kind of under-
50300						standing about  himself.
50400						Dearth, lack of information.
50500						He's in some remission. Seems
50600						somewhat like a put-on. Seems
50700						he was paranoid and is in 
50800						some remission at this time.
50900	
51000	{7} Dr. - COULD YOU SAY SOMETHING
51100	          NOW ABOUT YOUR PARANOID 
51200	          FEELINGS BOTH AT THE 
51300	          TIME OF ADMISSION AND
51400	          DO YOU HAVE SIMILAR FEELINGS
51500	          NOW AND IF SO HOW DO THEY 
51600	          AFFECT YOU?
51700	    Pt. - AT THE TIME OF ADMISSION	This response moves paranoia 
51800	          I THOUGHT THE MAFIA WAS  	back up. Stretching reality 
51900	          AFTER ME AND NOW ITS THE	somewhat to think Hell's Angels 
52000	          HELL'S ANGELS			are still interested in him.
52100						Somewhat bizarre in terms of 
52200	                                        content. Quite paranoid.
52300	      {RESPONSE 8 PATIENT 9}		Still paranoid. Gross and primitive
52400						responses. In middle of interview I
52500						felt patient was in touch but now
52600						responses have more concrete aspect.
52700	
52800	{8} Dr. - DO YOU HAVE ANY THOUGHT
52900	          AS TO WHY THESE TWO
53000	          GROUPS WERE AFTER YOU?
53100	    Pt. - BECAUSE I STOPPED SOME 	Response seems far fetched 
53200	          OF THEIR DRUG SUPPLY		and hard to believe unless 
53300						he was a narcotic agent which 
53400						I doubt. Sounds somewhat 
53500	      {RESPONSE 9 PATIENT 9}		grandiose, magical, paranoid
53600						flavor, in general indicates 
53700						he's psychotic, paranoid 
53800						schizophrenic with delusions  
53900						about these two groups and 
54000						I wouldn't rule out
54100						some hallucinations as well.
54200						Appropriateness of response 
54300						answers question in concrete 
54400						but unbelievable way.
54500	
54600	6.5 ANALYSIS (1)
54700		Names of potential protocol judges (N=105) were selected from
54800	the 1970 American Psychiatric Association Directory using a table  of
54900	random  numbers. They were initially not informed that a computer was
55000	involved.  (After the experiment,  the  participating  judges  (N=33)
55100	were  fully  informed  as  to its purpose and results.) The 105 names
55200	were divided into eight groups.  Each member  of  a  group  was  sent
55300	transcripts  of three interviews along with a cover letter requesting
55400	his  participation  in  the  experiment.  The  interview  transcripts
55500	consisted of:
55600		1) An interview conducted by one of the eight judges with the
55700		  paranoid model,
55800		2) An interview conducted by the same interview judge with a 
55900		  human paranoid patient, and
56000		3) An interview conducted by a different psychiatrist with a 
56100		  human patient who was not clinically paranoid.
56200	
56300	After each input-output pair in the transcripts there were two  lines
56400	of  rating numbers such that the protocol judges could circle numbers
56500	corresponding to their ratings of both the previous response  of  the
56600	patient,  and  an  overall  evaluation of the patient on the paranoid
56700	continuum. Thirty-three protocol judges returned the rated protocols
56800	properly filled out and all were used in our data.
56900	
57000		The  interviews  with  nonparanoid  patients were included to
57100	control for the  hypothesis  that  any  teletyped  interview  with  a
57200	patient  might  be  judged  "paranoid". However, virtually all of the
57300	ratings of the nonparanoid interviews were zero for paranoia. Hence
57400	the hypothesis was falsified.
57500	
57600	
57700		The first index of indistinguishability between model and
57800	patient was simple: namely, the final overall rating given the
57900	patient and the model. The question was: "Which was rated as being
58000	more paranoid, the patient, the model, or neither?" (See Table 1.)
58100	The protocol judges were more likely than the interview judges to
58200	distinguish the overall paranoid level of the model and the patient.
58300	In 37.5% of the paired interviews, the interview judges gave tied
58400	scores to the model and the patient, as contrasted to only 9% of the
58500	protocol judges. Of the 35 non-tied paired ratings, 15 rated the model
58600	as being more paranoid. If p is the theoretical probability of a judge
58700	judging the model more paranoid than a human paranoid patient, we find
58800	the 95% confidence interval for p to be 0.27 to 0.59. Since p=0.5
58900	indicates indistinguishability of overall ratings of model and patient
59000	and our observed p=0.43 lies within this interval, the results support
59100	the claim that the model is a successful simulation of a paranoid patient.
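
The arithmetic behind these figures can be retraced in a few lines. The
text does not say which interval method was used, so the sketch below
assumes the simple normal (Wald) approximation; it gives roughly 0.26
to 0.59, close to the reported 0.27 to 0.59. The function name is
hypothetical.

```python
from math import sqrt

interview_judges, protocol_judges = 8, 33

# Ties implied by the reported percentages: 37.5% of 8 and 9% of 33.
ties = round(0.375 * interview_judges) + round(0.09 * protocol_judges)
non_tied = interview_judges + protocol_judges - ties
model_rated_higher = 15

def wald_interval(successes, trials, z=1.96):
    """Normal-approximation 95% interval for a binomial proportion."""
    p = successes / trials
    half = z * sqrt(p * (1 - p) / trials)
    return p, p - half, p + half

p_hat, low, high = wald_interval(model_rated_higher, non_tied)
print(non_tied, round(p_hat, 2), round(low, 2), round(high, 2))
```

Other interval methods give slightly different lower bounds, which may
account for the small difference from the reported value.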
59200	
59300		Separate  analysis  of  the strong and weak versions of PARRY
59400	indicated that the strong version was  indeed  judged  more  paranoid
59500	than  the  paranoid  patients, the weak version less paranoid. Thus a
59600	change in the parameter structure of the paranoid  model  produced  a
59700	change  along  the  dimension  of  paranoid  behavior in the expected
59800	direction.
59900	
60000	(TABLE 1
60100		Relative final overall ratings of paranoid model vs. paranoid
60200	patient indicating which was given the higher overall rating of paranoia
60300	at end of interview.)
60400		(INSERT TABLE 1 HERE)
60500	
60600	
60700	
60800	
60900	
61000	
61100	
61200	
61300	6.6 ANALYSIS (2)
61400		The second index of indistinguishability is a more sensitive
61500	measure based on the two series of response ratings in the paired
61600	interviews. The statistic used is basically the standardized
61700	Mann-Whitney statistic (Siegel, 1956).
61800			(INSERT EQUATION HERE)
61900	
62000	where R is the sum of the ranks of the response ratings in the series
62100	of ratings given to the model, n the number of responses given by the
62200	model, and m the number of  responses  given  by the patient.  If the
62300	ratings given by a judge are randomly allocated to model and patient,
62400	i.e. model and patient are indistinguishable in response ratings, the
62500	expected value of Z is 0, with unit standard  deviation.   If  higher
62600	ratings are more likely to be assigned to the model, Z is positive;
62700	conversely, negative values of Z indicate greater likelihood of
62800	assigning  higher  ratings to the patient. Each judge in evaluating a
62900	pair of interviews generates a single value of Z.
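
	The equation itself is missing from this draft. As a minimal sketch,
assuming the usual textbook form of the standardized rank-sum statistic,
Z = (R - n(n+m+1)/2) / sqrt(nm(n+m+1)/12), one judge's Z could be computed
as below. The function names are hypothetical. Giving tied ratings the
average of their ranks is an assumption, and no tie correction is applied
to the variance.

```python
from math import sqrt


def midranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_z(model_ratings, patient_ratings):
    """Standardized rank sum of the model's ratings (assumed textbook form)."""
    n, m = len(model_ratings), len(patient_ratings)
    ranks = midranks(list(model_ratings) + list(patient_ratings))
    R = sum(ranks[:n])                      # sum of ranks given to the model
    expected_R = n * (n + m + 1) / 2        # mean of R under random allocation
    sd_R = sqrt(n * m * (n + m + 1) / 12)   # standard deviation of R, no tie correction
    return (R - expected_R) / sd_R
```

	With this sign convention a positive result means the model's
responses tended to receive the higher ratings, as described above. With
many tied ratings the uncorrected variance makes Z somewhat conservative.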
63000	
63100		The overall mean of the Z scores was +0.044 with a standard
63200	deviation of 1.68 (df=40). Thus the overall 95% confidence interval for
63300	the asymptotic mean value of Z is -0.485 to +0.573. The range of Z
63400	values is -3.8 to +4.46. The length of the confidence interval is a
63500	result of the large variance, which itself is mainly related to the
63600	contrast between the weak and strong versions (see TABLES 2 and 3).
63700	Once  again the strong version of the model is more paranoid than the
63800	patients, the weak version less paranoid.
63900	
64000		(INSERT TABLE 2)
64100		(SUMMARY STATISTICS OF Z RATINGS BY GROUP)
64200	
64300	
64400	
64500	
64600	
64700	
64800	
64900	
65000	
65100		It  is  not  surprising that results using the two indices of
65200	indistinguishability are  parallel,  since  the  indices  are  highly
65300	interrelated.  The  mean  Z  value for the 15 interviews on which the
65400	model was rated more paranoid was +1.28, on the  6  where  model  and
65500	patient tied: 0.41, on the 20 in which the patient was more paranoid:
65600	-0.993. In 6 interviews Z was positive although the patient was
65700	given the higher overall rating; in 2 interviews Z was negative
65800	although the model was rated more paranoid.
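
	As a rough consistency check, the three group means just quoted can
be pooled into an overall mean, and a t interval formed around it. This is
a sketch only: the helper names are hypothetical, and the critical value
2.021 (the tabled two-sided 95% point of t for 40 degrees of freedom) is
an assumption, since the draft does not state how its interval was
computed.

```python
from math import sqrt

# (number of interviews, mean Z) for each outcome group, as quoted in the text:
# model rated more paranoid, tie, patient rated more paranoid.
groups = [(15, 1.28), (6, 0.41), (20, -0.993)]


def pooled_mean(groups):
    """Weighted mean of the group means, weighted by group size."""
    total = sum(count for count, _ in groups)
    return sum(count * mean_z for count, mean_z in groups) / total


def t_interval(mean, sd, n, t_crit):
    """Two-sided confidence interval: mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / sqrt(n)
    return mean - half_width, mean + half_width


n_interviews = sum(count for count, _ in groups)   # 41 interviews, hence df = 40
mean_z = pooled_mean(groups)
# ASSUMPTION: 2.021 is the tabled 95% two-sided t value for df = 40.
low, high = t_interval(mean_z, 1.68, n_interviews, 2.021)
print(f"pooled mean Z = {mean_z:+.3f}, 95% interval = {low:+.3f} to {high:+.3f}")
```

	The pooled mean comes out slightly above zero, and the interval
agrees with the one reported in the text to within rounding of the
published mean and standard deviation.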
65900	
66000	(INSERT TABLE 3)
66100	(Analysis of Variance of Z Ratings)
66200	
66300	
66400	
66500	
66600	
66700	
66800	
66900	
67000	
67100	
67200	
67300	
67400	
67500		It  is  worth emphasizing that these tests invited refutation
67600	of the model.   The experimental design of the tests put the model in
67700	jeopardy of falsification. Had the paranoid model not survived
67800	these tests, i.e. had it not been considered paranoid by expert
67900	judges and had there been no correlation between the weak-strong
68000	versions of the model and the severity ratings of the judges, then no
68100	claim regarding the success of the simulation could have been made. Survival
68200	of potentially falsifying tests constitutes a validating step  for  a
68300	model.
68400	
68500	6.7 ANALYSIS (3) THE MACHINE QUESTION
68600		For quite a long time people have wondered how to distinguish
68700	a  man  from  an  imitation  of  a  man.  The  Greeks made statues so
68800	lifelike, it is said, they had to be chained down to keep  them  from
68900	walking  away.  To distinguish a man from a statue, Galileo suggested
69000	tickling each with a feather.  To distinguish a man  from  a  machine
69100	Descartes  proposed  conversational  tests which the machine, lacking
69200	the  ability  to  make  appropriate  replies,  would  fail.  Turing's
69300	imitation  games  have  been  discussed  on  p.000.  As heirs to this
69400	tradition, we perhaps inevitably  became  curious  how  judges  using
69500	transcripts might answer the machine-question, i.e. which interviewee
69600	is a human and which is the computer model?
69700		To  ask  the machine-question, we sent interview transcripts,
69800	one with a patient and one with PARRY, to 100 psychiatrists  randomly
69900	selected from the Directory of American Specialists and the Directory
70000	of the American Psychiatric Association.  Of the 41 replies, 21 (51%)
70100	made the correct identification while 20 (49%) were wrong.   Based on
70200	this random sample of 41 psychiatrists, the 95% confidence interval for the
70300	percentage correct is 35.9 to 66.5. The results are consistent with chance guessing.
70400		Psychiatrists   are   considered  expert  judges  of  patient
70500	interview behavior but they are unfamiliar with computers.  Hence  we
70600	conducted  the  same  test  with  100  computer  scientists  randomly
70700	selected from the membership list of the  Association  for  Computing
70800	Machinery,  ACM.   Of the 67 replies 32 (48%) were right and 35 (52%)
70900	were wrong. Based on this random sample of 67 computer scientists the
71000	95% confidence interval for the percentage correct ranges from 36 to 60.
71100	Again the results are close to a chance level.
71200		So both computer scientists and psychiatrists were unable, at
71300	better than a random guessing level, to  distinguish  transcripts  of
71400	interviews  with  the  model from transcripts of interviews with real
71500	patients.
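
	The two intervals quoted above can be reproduced, to rounding, with
the ordinary normal approximation to the binomial. The sketch below
assumes that approximation (the draft does not say which method was used).
The function name is hypothetical.

```python
from math import sqrt


def percent_correct_interval(right, replies, z=1.96):
    """Normal-approximation 95% interval for the percentage of correct replies."""
    p = right / replies
    half_width = z * sqrt(p * (1 - p) / replies)
    return 100 * (p - half_width), 100 * (p + half_width)


# Counts taken from the text: 21 of 41 psychiatrists and 32 of 67
# computer scientists identified the model correctly.
psychiatrists = percent_correct_interval(21, 41)
computer_scientists = percent_correct_interval(32, 67)

for label, (low, high) in [("psychiatrists", psychiatrists),
                           ("computer scientists", computer_scientists)]:
    # An interval that contains 50 is compatible with pure guessing.
    print(f"{label}: {low:.1f} to {high:.1f}, includes 50: {low <= 50 <= high}")
```

	Both intervals contain 50 percent, which is what "chance guessing"
means here: the data do not rule out judges choosing at random.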
71600		But  what  do  we  learn from asking the machine-question and
71700	finding that the distinction is not made? What we would most like  to
71800	know  is  how  to improve the model.  Simulation models do not spring
71900	forth in a complete, perfect and final form; they must  be  gradually
72000	developed  over  time.   Perhaps  a  correct model-patient distinction
72100	might be made if we allowed  a  large  number  of  expert  judges  to
72200	conduct the interviews themselves rather than studying transcripts of
72300	other interviewers.  This would indeed indicate that the  model  must
72400	be improved. But unless we systematically investigated how the judges
72500	succeeded in making  the  discrimination,  we  would  not  know  what
72600	aspects  of the model to work on.  The logistics of such a design are
72700	immense, and obtaining a large number of judges for sound statistical
72800	inference would require an effort incommensurate with the information
72900	yielded.
73000	
73100	6.8 ANALYSIS (4)  MULTIDIMENSIONAL EVALUATION 
73200		A more efficient and informative way to use Turing-like tests
73300	is  to  ask  judges  to  make  ratings  along  scaled dimensions from
73400	teletyped interviews. This might  be  called  asking  the  "dimension
73500	question".   One  can then compare scaled ratings of the patients and
73600	the model in order to determine precisely where and by how much  they
73700	differ.   In constructing our model we strove for one which exhibited
73800	indistinguishability along  some  dimensions  and  distinguishability
73900	along others. That is, we wanted the model to converge on what it was
74000	intended to simulate and to diverge from that which it was not.
74100		Paired-interview   transcripts   were  sent  to  another  400
74200	randomly selected psychiatrists asking them to rate the responses  of
74300	the two `patients' along multiple dimensions. The judges were divided
74400	into groups, each judge being asked to rate  responses  of  each  I-O
74500	pair  in  the  interviews along four dimensions.  The total number of
74600	dimensions in this test  was  twelve:   linguistic  noncomprehension,
74700	thought  disorder,  organic brain syndrome, bizarreness, anger, fear,
74800	ideas of reference, delusions, mistrust,  depression,  suspiciousness
74900	and  mania.  These are dimensions which psychiatrists commonly use in
75000	evaluating patients. There were three groups of  judges,  each  group
75100	being assigned 4 of the 12 dimensions.
75200	
75300			(INSERT TABLE 4 HERE)
75400	
75500		Table 4 shows there were significant differences, with  PARRY
75600	receiving   higher   scores   along   the  dimensions  of  linguistic
75700	noncomprehension, thought disorder, bizarreness, anger, mistrust  and
75800	suspiciousness.  On  the  delusion  dimension the patients were rated
75900	significantly higher.   There were no significant  differences  along
76000	the  dimensions  of organic brain syndrome, fear, ideas of reference,
76100	depression and mania.
76200		Whereas    tests   asking   the   machine-question   indicate
76300	indistinguishability at  the  gross  level,  a  study  of  the  finer
76400	structure  of  the  model's  behavior  through  ratings  along scaled
76500	dimensions shows significant differences between patients and  model.
76600	These differences are of help to us in suggesting what aspects of the
76700	model should be modified to improve its performance.   The  graph  of
76800	Fig.  2 shows that no modifications are necessary along the dimension
76900	of  "organic  brain  syndrome".   But  it  is  clear   that   PARRY'S
77000	language-comprehension  might  be improved. Then a future dimensional
77100	test would tell whether improvement had occurred  and  by  how  much.
77200	Successive  identification  of particular areas of failure provides a
77300	type of sensitivity analysis  which  makes  clear  what  improvements
77400	should be pursued in developing more adequate model versions.
77500	
77600		(INSERT FIG. 2 HERE)
77700	
77800	6.9 ANALYSIS (5)  A RANDOM MODEL
77900		Further evidence that the machine-question is too low a
78000	hurdle for a simulation model and too insensitive a test comes from
78100	the following experiment. In this test we constructed a random
78200	version of the paranoid model (RANDOM-PARRY) which utilized PARRY'S
78300	output statements but expressed them at random, independently of what
78400	the interviewer said. Two psychiatrists conducted interviews with
78500	this model, transcripts of which were paired with patient  interviews
78600	and  sent  to  200  randomly  selected  psychiatrists asking both the
78700	machine-question and the dimension-question.   Of the 69  replies  to
78800	the  machine  question, 34 (49%) were right and 35 (51%) wrong. Based
78900	on this  random  sample  of  69  psychiatrists,  the  95%  confidence
79000	interval ranges from 39 to 63, again indicating chance guessing. When
79100	a poor model, such as a  random  one,  passes  a  test,  it  strongly
79200	suggests the test is weak.
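
	The idea behind such a random control can be sketched in a few
lines. This is an illustration only, not the RANDOM-PARRY program: the
class name is hypothetical, the statements are placeholders rather than
actual PARRY output, and uniform selection with repetition is an
assumption.

```python
import random


class RandomResponder:
    """Replies with a statement chosen at random, ignoring the interviewer's input."""

    def __init__(self, statements, seed=None):
        self._statements = list(statements)
        self._rng = random.Random(seed)  # private generator so runs can be repeated

    def respond(self, interviewer_input):
        # The input is deliberately unused: the reply does not depend on it.
        return self._rng.choice(self._statements)


# Placeholder statements standing in for the model's stock of output sentences.
placeholder_statements = ["STATEMENT 1", "STATEMENT 2", "STATEMENT 3"]
responder = RandomResponder(placeholder_statements, seed=0)
for question in ["HOW ARE YOU?", "WHY ARE YOU IN THE HOSPITAL?"]:
    print(question, "->", responder.respond(question))
```

	Because the reply never depends on the question, any coherence a
judge sees in such a transcript is accidental, which is what makes the
random version a useful lower bound for a test.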
79300	
79400		(INSERT TABLE 5 HERE)
79500	
79600		Although a distinction is not made when  the simple  machine-
79700	question is asked, definite distinctions ARE made when judgements are
79800	requested  along  specific  dimensions.    As  shown  in   Table   5,
79900	significant  differences  appear  along  the dimensions of linguistic
80000	noncomprehension, thought disorder and bizarreness, with RANDOM-PARRY
80100	rated  higher.   On  these  particular  dimensions we can construct a
80200	continuum in which the random version  represents  one  extreme,  the
80300	actual patients another. Nonrandom PARRY lies somewhere between these
80400	two extremes, indicating that it performs significantly  better  than
80500	the random version but still requires improvement before it can be
80600	considered   indistinguishable   from   patients  relative  to  these
80700	dimensions. Table 6 presents t values for  differences  between  mean
80800	ratings  of  PARRY  and  RANDOM-PARRY. (See Table 6 and Fig. 2 for the
80900	mean ratings).
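
	The draft does not say which form of t test lies behind these
values. As one possibility, a sketch assuming an unequal-variance (Welch)
t statistic for the difference between two mean ratings on one dimension
is given below. The function name is hypothetical and the ratings are
invented for illustration only; they are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(ratings_a, ratings_b):
    """Welch t statistic for the difference between two mean ratings."""
    # Sample variance of each group's mean (variance / group size).
    var_mean_a = variance(ratings_a) / len(ratings_a)
    var_mean_b = variance(ratings_b) / len(ratings_b)
    return (mean(ratings_a) - mean(ratings_b)) / sqrt(var_mean_a + var_mean_b)


# HYPOTHETICAL ratings on a single dimension, for illustration only.
version_a_ratings = [1, 2, 3]
version_b_ratings = [4, 5, 6]
print(round(welch_t(version_a_ratings, version_b_ratings), 3))
```

	A negative value means the first series was rated lower on that
dimension. Judging significance would also require the degrees of
freedom, which this sketch does not compute.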
81000	
81100		(INSERT TABLE 6 AND FIG 2 HERE)
81200	
81300		These studies show that a more useful way to use  Turing-like
81400	indistinguishability  tests  is  to ask expert judges to make ratings
81500	along multiple dimensions deemed essential to the model.    Thus  the
81600	model  can  serve  as  an  instrument for its own perfection.  A good
81700	validation procedure has criteria for better or worse approximations.
81800	Useful  tests do not necessarily prove a model; they probe it for its
81900	strengths and weaknesses and clarify what is to be done next  in  the
82000	way  of  modification  and repair. Simply asking the machine-question
82100	yields little information relevant to what  the  model  builder  most
82200	wants  to know, namely, along which dimensions does the model need to
82300	be modified in order to effect an improvement in its performance?
82400	
82500		To  conclude,  it  is  perhaps  historically significant that
82600	these tests were conducted at all. To my knowledge, no  one  to  date
82700	has  subjected  an  interactive  simulation  model  of human symbolic
82800	processes to multidimensional indistinguishability tests. These tests
82900	set a precedent and provide a standard against which competing models
83000	might be measured.